Computer Organization CH4 The Processor

4.1 intro

intro p3

performance

different ISA -> different instruction count
- clock cycle time and CPI also differ
- these are determined by the processor implementation
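
A minimal sketch of how the three factors above combine into CPU time. The function name and the sample numbers are made up for illustration; they are not from the slides.

```python
def cpu_time(instruction_count, cpi, clock_period_s):
    """CPU time = instruction count x CPI x clock cycle time (seconds)."""
    return instruction_count * cpi * clock_period_s


# Hypothetical program: 1,000,000 instructions, CPI 1.5, 1 ns clock period.
example_seconds = cpu_time(1_000_000, 1.5, 1e-9)
```

The ISA (with the compiler) fixes the first factor; the processor implementation fixes the other two.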

CH4

a more realistic pipelined version

instruction execution overview p4

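start from the program counter (PC)
- holds the address of the first instruction of the program to fetch
- in modern architectures, the instruction comes from the L1 cache
read 1 or 2 registers
use the ALU after reading the registers
- find the memory address of the next inst.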
  - arithmetic-logical inst.
  - memory addr. calculation
  - conditional branches
diff. by inst. classes
- read p5

the first two steps are the same for every instruction

simplicity and regularity of RISC-V inst. set
- simplify implementation

instruction execution flow

data is copied (fanned out) so that multiple units can see it
p6

cont’d

the multiplexers are still missing
control unit
p7

4.2 logic design conventions

Logic design basics p10

state or not state

output depends only on the current input -> no state (combinational)
output depends on both state and input -> has state (sequential)

edge trigger

state is updated only on a clock edge
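
A small sketch contrasting a combinational element with an edge-triggered state element. The class and method names are hypothetical, and a rising-edge trigger is assumed.

```python
def adder(a, b):
    """Combinational: the output depends only on the current inputs."""
    return (a + b) & 0xFFFFFFFF  # wrap to 32 bits


class Register:
    """State element: the stored value changes only on a rising clock edge."""

    def __init__(self):
        self.q = 0           # currently stored (visible) value
        self._prev_clk = 0   # last clock level seen, used to detect an edge

    def tick(self, clk, d):
        # Capture the input d only on a 0 -> 1 clock transition.
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q
```

Holding the clock high and changing `d` leaves `q` unchanged, which is the point of edge triggering.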

4.3 building datapath

datapath

besides the 32 registers, there are various other miscellaneous registers

p15

R-format ALU

Load/Store

p17
the immediate goes in first
- the inst. is executed afterwards
- p18

branch

PC-relative address
- offset counted in halfwords
p21
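
A sketch of the PC-relative target calculation, assuming the 12-bit encoded branch immediate counts halfwords, so it is sign-extended and shifted left by one. The helper names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def branch_target(pc, imm12):
    """Target = PC + (sign-extended immediate << 1), i.e. offset in halfwords."""
    return pc + (sign_extend(imm12, 12) << 1)
```

For example, an immediate of 4 moves the PC forward by 8 bytes, and an all-ones immediate (0xFFF) moves it back by 2 bytes.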

compose the elements

use multiplexers

p24
p25
- adds ALU control
- conditional branch is used

4.4 A simple implementation scheme

ISA vs. Hardware design

RISC-V

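the instruction formats are more complex, but the field layout is kept consistent
- rd is always in the rd position, rs always in the rs position
- simplifies the design of the hardware datapath
compared with MIPS
- field positions change, so an extra multiplexer is needed to select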
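
A sketch of why the fixed layout helps: the same bit slices can be wired out for every instruction, with no format-dependent selection. Field positions follow the standard RISC-V 32-bit encoding; the function name is hypothetical.

```python
def decode_fields(inst):
    """Slice a 32-bit RISC-V instruction word into its fixed-position fields."""
    return {
        "opcode": inst & 0x7F,          # bits 6:0
        "rd":     (inst >> 7) & 0x1F,   # bits 11:7
        "funct3": (inst >> 12) & 0x7,   # bits 14:12
        "rs1":    (inst >> 15) & 0x1F,  # bits 19:15
        "rs2":    (inst >> 20) & 0x1F,  # bits 24:20
        "funct7": (inst >> 25) & 0x7F,  # bits 31:25
    }


# Encoding of: add x1, x2, x3
ADD_X1_X2_X3 = 0x003100B3
fields = decode_fields(ADD_X1_X2_X3)
```

Here `fields["rd"]` is 1, `fields["rs1"]` is 2, and `fields["rs2"]` is 3, read from the same positions regardless of instruction class.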

Setting of the Main Control Unit

roughly the same content as the previous section

what the opcode looks like
- using flow map to compare
- the figure is important
focus on multiplexor

execution flow of R-format inst.

add x1, x2, x3

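assume the instruction has already been fetched using the PC
the opcode is sent to the main control
- which drives all of the control signals
ALU op
- 10 R-type
  - 00 add
  - 01 subtract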
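
A sketch of how the 2-bit ALU op could be turned into an ALU operation. The 4-bit output codes (0010 add, 0110 subtract, 0000 AND, 0001 OR) are assumed from the usual textbook table, and the function name is hypothetical.

```python
def alu_control(alu_op, funct7=0, funct3=0):
    """Map ALUOp (plus funct fields for R-type) to a 4-bit ALU control code."""
    if alu_op == 0b00:          # lw / sw: always add (address calculation)
        return 0b0010
    if alu_op == 0b01:          # beq: subtract to compare
        return 0b0110
    if alu_op == 0b10:          # R-type: operation chosen by funct7/funct3
        r_type = {
            (0b0000000, 0b000): 0b0010,  # add
            (0b0100000, 0b000): 0b0110,  # sub
            (0b0000000, 0b111): 0b0000,  # and
            (0b0000000, 0b110): 0b0001,  # or
        }
        return r_type[(funct7, funct3)]
    raise ValueError("unsupported ALUOp")
```

For `add x1, x2, x3` the main control emits ALUOp 10 and the funct fields (0000000, 000) select add.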


execution flow of I-format inst.

lw x1, offset(x2)

ALU op
- 00 lw or sw
  - 00 add
a fourth step is added
- which makes the execution time longer

execution flow of branch inst.

beq x1, x2, offset

control & result decide next PC
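
A sketch of the next-PC choice for `beq`: the branch is taken only when the control unit asserts Branch and the ALU reports a zero result (x1 - x2 == 0). The function and parameter names are hypothetical.

```python
def next_pc(pc, branch, zero, offset_bytes):
    """Select PC + offset when (Branch AND Zero), otherwise PC + 4."""
    if branch and zero:
        return pc + offset_bytes
    return pc + 4
```

A non-branch instruction has Branch = 0, so it always falls through to PC + 4 even if the ALU result happens to be zero.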

logic of the control unit

mapping between the opcode and the control unit outputs
the clock period is set by the instruction with the longest execution time
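
A sketch of the opcode-to-control mapping as a lookup table. The signal values are assumed from the usual single-cycle textbook table, `None` marks a don't-care, and the function name is hypothetical.

```python
# opcode -> control signals (None = don't care)
CONTROL_TABLE = {
    0b0110011: dict(ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0,
                    MemWrite=0, Branch=0, ALUOp=0b10),      # R-format
    0b0000011: dict(ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1,
                    MemWrite=0, Branch=0, ALUOp=0b00),      # lw
    0b0100011: dict(ALUSrc=1, MemtoReg=None, RegWrite=0, MemRead=0,
                    MemWrite=1, Branch=0, ALUOp=0b00),      # sw
    0b1100011: dict(ALUSrc=0, MemtoReg=None, RegWrite=0, MemRead=0,
                    MemWrite=0, Branch=1, ALUOp=0b01),      # beq
}


def main_control(opcode):
    """Return the control signals for a 7-bit opcode."""
    return CONTROL_TABLE[opcode]
```

Only `lw` asserts MemRead and only `beq` asserts Branch, which is what steers the multiplexers in the datapath figure.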

4.6 an overview of pipeline

pipeline

sequential laundry

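takes longer
pipelined laundry
uses all the resources as efficiently as possible
does not reduce the time of each individual job
- but keeps otherwise idle resources as busy as possible
- at best it is 4 times faster than seq.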
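
A sketch of the laundry timing, assuming 4 equal-length stages. The 0.5-hour stage time and the function names are illustrative assumptions.

```python
def sequential_time(loads, stages, stage_time):
    """Each load finishes all stages before the next load starts."""
    return loads * stages * stage_time


def pipelined_time(loads, stages, stage_time):
    """The first load fills the pipeline; each later load adds one stage time."""
    return (stages + loads - 1) * stage_time


def speedup(loads, stages=4, stage_time=0.5):
    return (sequential_time(loads, stages, stage_time)
            / pipelined_time(loads, stages, stage_time))
```

With 4 loads the speedup is only 8 / 3.5; it approaches 4 (the number of stages) as the number of loads grows, and each load still takes 4 stage times.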

single cycle vs. pipeline performance


the pipeline cycle is set by the slowest major functional unit
- assuming there are no dependencies
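
A sketch comparing the two clocking schemes. The per-stage latencies (in ps) are assumed for illustration, and no dependencies between instructions are modeled.

```python
# Assumed latency of each major functional unit, in picoseconds.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_total(n_instructions, stage_ps=STAGE_PS):
    """Clock period must cover the longest instruction (all stages in a row)."""
    return n_instructions * sum(stage_ps.values())


def pipelined_total(n_instructions, stage_ps=STAGE_PS):
    """Clock period is the slowest stage; pipeline fill costs (stages - 1) cycles."""
    period = max(stage_ps.values())
    return (len(stage_ps) + n_instructions - 1) * period
```

Under these assumed numbers, three instructions take 2400 ps single-cycle and 1400 ps pipelined; the gap widens as the instruction count grows.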

insights of the design for pipeline execution


Hazards

structure
- can be solved by adding hardware resources
data
- the pipeline gets stalled
control
- occurs when there is a branch
- unknown whether to execute the branch target or PC+4

structure Hazard

assume there is only 1 memory with 1 port
- only one access to the memory can happen at a time

graphical representation of the pipeline

shading on the right half
- read
shading on the left half
- write

data hazard

the result written by one instruction is a source operand of another instruction

sol.

forwarding
- done in hardware (bypassing)
- passes the result directly to the next instruction
- load-use data hazard
  - still needs an extra bubble
Code scheduling
- reorder the instructions
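
A sketch of load-use detection, assuming full forwarding so that only a load followed immediately by a consumer costs a bubble. The `Inst` type and function name are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Inst(NamedTuple):
    op: str                 # mnemonic, e.g. "lw" or "add"
    rd: Optional[int]       # destination register number, None if no write
    srcs: Tuple[int, ...]   # source register numbers


def count_load_use_bubbles(program):
    """Count bubbles: a lw whose rd is read by the very next instruction."""
    bubbles = 0
    for prev, cur in zip(program, program[1:]):
        # x0 is hardwired to zero, so a load into x0 creates no dependence.
        if prev.op == "lw" and prev.rd not in (None, 0) and prev.rd in cur.srcs:
            bubbles += 1
    return bubbles


# lw x1, 0(x2); add x3, x1, x4; sub x5, x6, x7
original = [Inst("lw", 1, (2,)), Inst("add", 3, (1, 4)), Inst("sub", 5, (6, 7))]
# Code scheduling: move the independent sub between the lw and the add.
scheduled = [original[0], original[2], original[1]]
```

The original order needs one bubble; the scheduled order needs none, because the independent `sub` fills the slot after the load.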

Control Hazard

sol.

stall
- no extra hardware resources
  - should need 2 stall cycles
prediction
- guess the outcome
- static vs. dynamic
delayed branch
- do the branch earlier
…

4.7 pipelined datapath and control

…

Pipeline control


pipeline control on five stage (step 2)

instruction fetch
instruction decode/register file read
Execution/address calculation
- ALUop and ALUsrc
memory access
- control lines set in this stage
  - Branch: data flows backward
  - MemRead
  - MemWrite
write-back
- data flows backward
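
A sketch of grouping the control lines by the stage that consumes them. The EX and MEM groups follow the list above; the write-back group (RegWrite, MemtoReg) is assumed from the usual five-stage design, and the `lw` values and function name are illustrative.

```python
# Which control lines each later pipeline stage uses.
STAGE_SIGNALS = {
    "EX":  ("ALUOp", "ALUSrc"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB":  ("RegWrite", "MemtoReg"),   # assumed, not listed in the notes
}


def split_by_stage(control):
    """Split one instruction's control signals into per-stage groups."""
    return {
        stage: {name: control[name] for name in names}
        for stage, names in STAGE_SIGNALS.items()
    }


# Assumed control values for lw.
LW_CONTROL = dict(ALUOp=0b00, ALUSrc=1, Branch=0, MemRead=1,
                  MemWrite=0, RegWrite=1, MemtoReg=1)
lw_stages = split_by_stage(LW_CONTROL)
```

No control lines are needed for instruction fetch or decode, since those two stages do the same work for every instruction.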